Ensemble theories have received a lot of interest recently as a means
of explaining a lot of the detailed complexity observed in reality by a
vastly simpler description "every possibility exists" and a selection
principle (Anthropic Principle) "we only observe that which is consistent
with our existence". In this paper I show why, in an ensemble theory of the
universe, we should be inhabiting one of the elements of that ensemble
with least information content that satisfies the anthropic principle. This
explains the effectiveness of aesthetic principles such as Occam's razor in
predicting usefulness of scientific theories. I also show that, with a couple
of reasonable assumptions about the phenomenon of consciousness, the linear
structure of quantum mechanics can be derived.

Key words: Occam's razor, anthropic principle, ensemble theories,
multiverse, failure of induction, foundation of quantum mechanics



1 INTRODUCTION 

Wigner once remarked on "the unreasonable effectiveness of mathematics",
encapsulating in one phrase the mystery of why the scientific enterprise is so
successful. There is an aesthetic principle at large, whereby scientific theories
are chosen according to their beauty, or simplicity. These then must be tested
by experiment; the surprising thing is that the aesthetic quality of a theory
is often a good predictor of that theory's explanatory and predictive power.
This situation is summed up by William of Ockham, "Entities should not be
multiplied unnecessarily", known as Occam's Razor.

We start our search for an explanation of this mystery with the anthropic
principle. This is normally cast into either a weak form (that physical reality
must be consistent with our existence as conscious, self-aware entities) or a
strong form (that physical reality is the way it is because of our existence as
conscious, self-aware entities). The anthropic principle is remarkable in that
it generates significant constraints on the form of the universe. The two
main explanations for this are the Divine Creator explanation (the universe was
created deliberately by God to have properties sufficient to support intelligent
life), or the Ensemble explanation (that there is a set, or ensemble, of different
universes, differing in details such as physical parameters, constants and even
laws; however, we are only aware of such universes that are consistent with our
existence). In the Ensemble explanation, the strong and weak formulations of
the anthropic principle are equivalent.

Tegmark introduces an ensemble theory based on the idea that every
self-consistent mathematical structure be accorded the ontological status of physical
existence. He then goes on to categorize mathematical structures that have 
been discovered thus far (by humans), and argues that this set should be largely 
universal, in that all self-aware entities should be able to uncover at least the 
most basic of these mathematical structures, and that it is unlikely we have 
overlooked any equally basic mathematical structures. 

An alternative ensemble approach is that of Schmidhuber: the "Great
Programmer". This states that all possible halting programs of a universal
Turing machine (UTM) have physical existence. Some of these programs' outputs
will contain self-aware substructures; these are the programs deemed
interesting by the anthropic principle. Note that there is no need for the UTM
to actually exist, nor is there any need to specify which UTM is to be used: a
program that is meaningful on $\mathrm{UTM}_1$ can be executed on
$\mathrm{UTM}_2$ by prepending it with another program that describes
$\mathrm{UTM}_1$ in terms of $\mathrm{UTM}_2$'s instructions, then executing
the individual program. Since the set of halting programs (finite length
bitstrings) is isomorphic to the set of whole numbers $\mathbb{N}$, an
enumeration of $\mathbb{N}$ is sufficient to generate the ensemble that
contains our universe. In a later paper, Schmidhuber extends his ensemble to
non-halting programs, and considers the consequences of assuming that this
ensemble is generated by a machine with bounded resources.
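The correspondence between finite length bitstrings and the whole numbers can be made concrete. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and it enumerates every finite bitstring (it makes no attempt to decide which of them are halting programs).

```python
from itertools import count, islice


def index_to_bitstring(n: int) -> str:
    """Map a whole number n >= 0 to a finite bitstring (shortest strings first)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # bin(n + 1) looks like '0b1...'; dropping the leading '0b1' leaves the bitstring.
    return bin(n + 1)[3:]


def bitstring_to_index(bits: str) -> int:
    """Inverse map: recover the whole-number index of a finite bitstring."""
    if any(ch not in "01" for ch in bits):
        raise ValueError("expected only '0' and '1' characters")
    return int("1" + bits, 2) - 1


def enumerate_bitstrings():
    """Yield every finite bitstring exactly once, in order of increasing length."""
    for n in count(0):
        yield index_to_bitstring(n)


first_seven = list(islice(enumerate_bitstrings(), 7))
```

Because the two maps are mutual inverses, counting through the whole numbers visits each candidate program exactly once.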

Each self-consistent mathematical structure (member of the Tegmark
ensemble) is completely described by a finite set of symbols, a countable set
of axioms encoded in those symbols, and a set of rules (logic) describing how
one mathematical statement may be converted into another.^1 These axioms
may be encoded as a bitstring, and the rules encoded as a program of a UTM
that enumerates all possible theorems derived from the axioms, so each member
of the Tegmark ensemble may be mapped onto a Schmidhuber one.^2 The
Tegmark ensemble must be contained within the Schmidhuber one.

^1 Strictly speaking, these systems are called recursively enumerable formal
systems, and are only a subset of the totality of mathematics; however, this
seems in keeping with the spirit of Tegmark's suggestion.

^2 In the case of an infinite number of axioms, the theorems must be enumerated
using a dovetailer algorithm. The dovetailer algorithm is a means of walking an
infinite level tree, such that each level is visited in finite time. An example
is that for an n-ary tree, the nodes

An alternative connection between the two ensembles is that the
Schmidhuber ensemble is a self-consistent mathematical structure, and is therefore an
element of the Tegmark one. However, all this implies is that one element of 
the ensemble may in fact generate the complete ensemble again, a point made 
by Schmidhuber in that the "Great Programmer" exists many times, over and 
over in a recursive manner within his ensemble. This is now clearly true also of 
the Tegmark ensemble. 

2 UNIVERSAL PRIOR 

In this paper, I adopt a Schmidhuber ensemble consisting of all infinite length
bitstrings, denoted $\{0,1\}^\infty$. I call these infinite length strings
descriptions. By contrast to Schmidhuber, I assume a uniform measure over these
descriptions: no particular string is more likely than any other. It can be
shown that the cardinality of $\{0,1\}^\infty$ is the same as the cardinality
of the reals, $c$. This set cannot be enumerated by a dovetailer algorithm;
rather, the dovetailer algorithm enumerates all finite length prefixes of these
descriptions. Whereas in Schmidhuber's 1997 paper(4) the existence of the
dovetailer algorithm explains the ease with which the "Great Programmer" can
generate the ensemble of universes, I merely assume the pre-existence of all
possible descriptions. The information content of this complete set is
precisely zero, as no bits are specified. It is ontologically equivalent to
Nothing. This has been called the "zero information principle".

Since some of these descriptions describe self-aware substructures, we can ask
what these observers observe. An observer attaches sequences
of meanings to sequences of prefixes of one of these strings. A meaning belongs
to a countable set, which may be enumerated by the whole numbers. Thus
the act of observation may be formalised as a map
$O : \{0,1\}^\infty \to \mathbb{N}$. If $O(x)$ is a computable (also known as a
recursive) function, then $O(x)$ is equivalent to a Turing machine for which
every input halts. It is important to note that observers must be able to
evaluate $O(x)$ within a finite amount of subjective time, or the observer
simply ceases to be. The restriction to computable $O(x)$ connects this
viewpoint with the original viewpoint of Schmidhuber.

Another interpretation of this scenario is a state machine, possibly finite, 
consuming bits of an infinite length string. As each bit is consumed, the current 
state of the machine is the meaning attached to the prefix read so far. 
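The state machine picture can be illustrated with a toy observer. This is only a sketch under stated assumptions: the two states and the transition table are hypothetical, chosen so that the "meaning" of a prefix is simply the parity of the 1-bits read so far.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical two-state observer: (current state, input bit) -> next state.
TRANSITIONS: Dict[Tuple[str, int], str] = {
    ("even", 0): "even",
    ("even", 1): "odd",
    ("odd", 0): "odd",
    ("odd", 1): "even",
}


def observe(bits: Iterable[int], start: str = "even") -> List[str]:
    """Return the meaning attached to each successive prefix of the bit sequence."""
    state = start
    meanings = []
    for bit in bits:
        # Consuming one more bit moves the machine to its next state.
        state = TRANSITIONS[(state, bit)]
        meanings.append(state)
    return meanings


meanings = observe([1, 0, 1, 1])
```

Each entry of `meanings` is the state reached after one more bit, so the list plays the role of the sequence of meanings attached to the sequence of prefixes.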

Under the mapping $O(x)$, some descriptions encode for identical meanings
as other descriptions, so one should group the descriptions into equivalence
classes. In particular, strings where the bits after some bit number $n$ are
"don't care" bits are in fact equivalence classes of all strings that share
the first $n$ bits in common.

on the ith level are visited between steps $n^i$ and $n^{i+1} - 1$.

One can see that the size of the equivalence class drops off exponentially with
the amount of information encoded by the string. Under $O(x)$, the amount of
information is not necessarily equal to the length of the string, as some of the
bits may be redundant. The sum

$$P_O(s) = \sum_{p : O(p) = s} 2^{-|p|} \tag{1}$$

where $|p|$ means the number of bits of $p$ consumed by $O$ in returning $s$,
gives the size of the equivalence class of all descriptions having meaning $s$.
This measure distribution is known as a universal prior, or alternatively a
Solomonoff-Levin distribution, in the case where $O(x)$ is a universal prefix
Turing machine. The quantity

$$C_O(x) = -\log_2 P_O(O(x)) \tag{2}$$

is a measure of the information content, or complexity, of a description $x$. If
only the first $n$ bits of the string are significant, with no redundancy, then it
is easy to see $C_O(x) = n$. Moreover, if $O$ is a universal prefix Turing machine,
then the coding theorem assures that $C_O(x) \approx K(x)$, where $K(x)$ is the usual
Kolmogorov complexity, up to a constant independent of the length of $x$.
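A small worked example may help fix the definitions in Eqs. (1) and (2). The sketch below assumes a hypothetical observer given as a prefix-free lookup table (the table `TOY_OBSERVER` and the function names are illustrative, not from the paper); it sums $2^{-|p|}$ over all prefixes with the same meaning and then takes the negative base-2 logarithm.

```python
from fractions import Fraction
from math import log2
from typing import Dict

# Hypothetical prefix-free table: key = prefix p consumed by the observer O,
# value = the meaning s = O(p) returned after reading that prefix.
TOY_OBSERVER: Dict[str, str] = {
    "0": "A",    # 1 bit consumed
    "10": "B",   # 2 bits consumed
    "110": "A",  # a longer, redundant description with the same meaning as "0"
    "111": "C",  # 3 bits consumed
}


def prior(observer: Dict[str, str]) -> Dict[str, Fraction]:
    """P_O(s): sum of 2^-|p| over all prefixes p with O(p) = s."""
    measure: Dict[str, Fraction] = {}
    for prefix, meaning in observer.items():
        weight = Fraction(1, 2 ** len(prefix))
        measure[meaning] = measure.get(meaning, Fraction(0)) + weight
    return measure


def complexity(observer: Dict[str, str], meaning: str) -> float:
    """C_O = -log2 P_O(s) for the given meaning s."""
    return -log2(float(prior(observer)[meaning]))


P = prior(TOY_OBSERVER)
```

In this toy table the meanings "B" and "C" have a single description each, so their complexity equals the number of significant bits (2 and 3), while "A" has two descriptions and therefore a complexity below its shortest description length of 1 bit.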

We now invoke the self-sampling assumption: essentially, that we expect
to find ourselves in one of the universes with greatest measure, subject to the
constraints of the anthropic principle. This implies we should find ourselves in
one of the simplest (in terms of $C_O$) possible universes capable of supporting
self-aware substructures (SASes). This is the origin of physical law: why we
live in a mathematical, as opposed to a magical, universe. This is why aesthetic
principles, and Occam's razor in particular, are so successful at predicting
good scientific theories. This might also be called the "minimum information
principle".

A final comment highlights the distinction between this approach and
Schmidhuber's. Schmidhuber assumes that there is a given universal Turing
machine $U$ which generates the ensemble we find ourselves in. He even uses the
term "Great Programmer" to underscore this. Ontologically, this is no more
difficult than assuming there is an ultimate theory of everything, i.e. a final
set of equations from which all of physics can be derived. Occam's razor is a
consequence of the resource constraints of $U$. In my approach, there are no given
laws or global interpreter. By considering just the resource constraints of the
observer, even in the case of the ensemble having a uniform measure, Occam's
razor still applies.

3 THE WHITE RABBIT PARADOX 

An important criticism leveled at ensemble theories is what John Leslie calls
the failure of induction. If all possible universes exist, then what is to
say that our orderly, well-behaved universe won't suddenly start to behave in
a disordered fashion, such that most inductive predictions would fail?
This problem has also been called the White Rabbit paradox, presumably
in a literary reference to Lewis Carroll.

This sort of issue is addressed by consideration of measure. We should not 
worry about the universe running off the rails, provided it is extremely unlikely 
to do so. Note that Leslie uses the term range to mean what we mean by 
measure. At first consideration, it would appear that there are vastly more 
ways for a universe to act strangely, than for it to stay on the straight and 
narrow, hence the paradox. 

Evolution has taught us to be efficient classifiers of patterns, and to be
robust in the presence of errors. It is important to know the difference between
a lion and a lion-shaped rock, and to establish that difference in real time.
Only a finite number of the description's bits are processed by the classifier, the
remaining being "don't care" bits. Around each compact description is a cloud
of completely random descriptions considered equivalent by the observer. The
size of this cloud decreases exponentially with the complexity of the description.

This requirement imposes a significant condition on $O(x)$. Formally, each
connected component of the preimage $O^{-1}(s)$ must be dense, i.e. have nonzero
measure, in the space of descriptions.

Turing machines in general do not have this property of robustness against
errors. Single bit errors in the input typically lead to wildly different outcomes.
However, an artificial neural network, which is a computational model inspired
by the brain, does exhibit this robustness, leading to applications such as
classifying images in the presence of noisy or extraneous data.

So what are the chances of the laws of physics breaking down, and of us
finding ourselves in one of Lewis Carroll's creations? Such a universe will have
a very complex description. For instance, the coalescing of air molecules to
form a fire breathing dragon would involve the complete specification of the
states of some $10^{30}$ molecules, an absolutely stupendous amount of information,
compared with the simple specification of the big bang and the laws of physics
that gave rise to life as we know it. The chance of this happening is equally
remote, via Eq. (1).
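To get a feel for how remote, note that under Eq. (1) each additional bit of required specification halves the measure. The following sketch is illustrative only: it assumes, purely for the sake of the estimate, about $10^{30}$ molecules and at least one bit of specification per molecule, and the function name is hypothetical.

```python
from math import log10


def log10_relative_measure(extra_bits: float) -> float:
    """log10 of 2**(-extra_bits): measure relative to a description needing that many fewer bits."""
    return -extra_bits * log10(2)


# Assumption for illustration: roughly 1e30 molecules, one bit each at minimum.
dragon_log10 = log10_relative_measure(1e30)
```

Working in logarithms avoids underflow: the relative measure itself is far too small to represent as a floating point number.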

4 QUANTUM MECHANICS 

In the previous sections, I demonstrated that formal mathematical systems are
the most compressible, and have highest measure amongst all members of the
Schmidhuber ensemble. In this work, I explicitly assume the validity of the
Anthropic Principle, namely that we live in a description that is compatible
with our own existence. This is by no means a trivial assumption: it is
entirely possible that we are inhabiting a virtual reality where the laws of the
observed world needn't be compatible with our existence. However, to date, the
Anthropic Principle has been found to be valid(2).

In order to derive consequences of the Anthropic Principle, one needs to
have a model of consciousness, or at the very least some necessary properties that
a conscious observer must exhibit. I will explore the consequences of just two such
properties of consciousness.

The first assumption to be made is that observers will find themselves
embedded in a temporal dimension. A Turing machine requires time to separate
the sequence of states it occupies as it performs a computation. Universal
Turing machines are models of how humans compute things, so it is possible that all
conscious observers are capable of universal computation. Yet for our present
purposes, it is not necessary to assume observers are capable of universal
computation, merely that observers are embedded in time.

The second assumption, which is related to Marchal's computational
indeterminism, is that the simple mathematical description selected from the
Schmidhuber ensemble describes the evolution of an ensemble of possible experiences.
The actual world experienced by the observer is selected randomly from this
ensemble. More accurately, for each possible experience, an observer exists to
observe that possibility. Since it is impossible to distinguish between these
observers, the internal experience of that observer is as though it is chosen
randomly from the ensemble of possibilities. This I call the Projection Postulate.

The reason for this assumption is that it allows for very complex experiences
to be generated from a very simple process. It is a very generalised form of
Darwinian evolution, which exhibits extreme simplicity over ex nihilo creation
explanations of life on Earth. Whilst by no means certain, it does seem that
a minimum level of complexity of the experienced world is needed to support
conscious experience of that world according to the anthropic principle.

This ensemble of possibilities at time $t$ we can denote $\psi(t)$. Ludwig
introduces a rather similar concept of ensemble, which he equivalently calls state
to make contact with conventional terminology. At this point, nothing has been
said of the mathematical properties of $\psi$. I shall now endeavour to show that
$\psi$ is indeed an element of a complex Hilbert space, a fact normally assumed as
an axiom in conventional treatments of Quantum Mechanics.

The projection postulate can be modeled by a partitioning map
$A : \psi \to \{\psi_a, \mu_a\}$, where $a$ indexes the allowable range of
potential observable values corresponding to $A$, $\psi_a$ is the subensemble
satisfying outcome $a$, and $\mu_a$ is the measure associated with $\psi_a$
($\sum_a \mu_a = 1$).

Finally, we assume that the generally accepted axioms of set theory and
probability theory hold. Whilst the properties of sets are well known, and
needn't be repeated here, the Kolmogorov probability axioms are:

(A1) If $A$ and $B$ are events, then so is the intersection $A \cap B$, the
union $A \cup B$ and the difference $A - B$.

(A2) The sample space $S$ is an event, called the certain event, and the empty
set $\emptyset$ is an event, called the impossible event.

(A3) To each event $E$, $P(E) \in [0, 1]$ denotes the probability of that event.

(A4) $P(S) = 1$.

(A5) If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.

(A6) For a decreasing sequence $A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots$
of events with $\bigcap_n A_n = \emptyset$, we have $\lim_{n\to\infty} P(A_n) = 0$.

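Consider now the projection operator $\mathcal{P}_{\{a\}} : V \to V$ acting on an
ensemble $\psi \in V$, $V$ being the set of all such ensembles, to produce
$\psi_a = \mathcal{P}_{\{a\}}\psi$, where $a \in S$ is an outcome of an observation.
We have not at this stage assumed that $\mathcal{P}_{\{a\}}$ is linear. Define
addition for two distinct outcomes $a$ and $b$ as follows:

$$\mathcal{P}_{\{a\}} + \mathcal{P}_{\{b\}} = \mathcal{P}_{\{a,b\}}, \tag{3}$$

from which it follows that

$$\mathcal{P}_{A \subset S} = \sum_{a \in A} \mathcal{P}_{\{a\}} \tag{4}$$

$$\mathcal{P}_{A \cup B} = \mathcal{P}_A + \mathcal{P}_B - \mathcal{P}_{A \cap B} \tag{5}$$

$$\mathcal{P}_{A \cap B} = \mathcal{P}_A \mathcal{P}_B = \mathcal{P}_B \mathcal{P}_A. \tag{6}$$

These results extend to continuous sets by replacing the discrete sums by
integration over the sets with uniform measure. Here, as elsewhere, we use
$\sum$ to denote sum or integral respectively as the index variable $a$ is
discrete or continuous.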
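The set-like algebra of Eqs. (3), (5) and (6) can be checked on a small discrete example. This is a sketch under a simplifying assumption not made in the text at this stage: each operator is stored by its diagonal over a hypothetical four-outcome sample space, so that sums and products act outcome by outcome. All names are illustrative.

```python
from typing import Dict, FrozenSet

# Hypothetical finite sample space of outcomes.
S: FrozenSet[str] = frozenset({"a", "b", "c", "d"})

# Assumption: an operator is represented by its diagonal, outcome -> coefficient.
Operator = Dict[str, int]


def projector(A: FrozenSet[str]) -> Operator:
    """P_A, built as the sum of single-outcome projectors over the outcomes in A."""
    return {s: (1 if s in A else 0) for s in S}


def add(X: Operator, Y: Operator) -> Operator:
    return {s: X[s] + Y[s] for s in S}


def sub(X: Operator, Y: Operator) -> Operator:
    return {s: X[s] - Y[s] for s in S}


def compose(X: Operator, Y: Operator) -> Operator:
    """Apply Y, then X; for diagonal operators this is an outcome-wise product."""
    return {s: X[s] * Y[s] for s in S}


A = frozenset({"a", "b"})
B = frozenset({"b", "c"})

eq3_holds = add(projector(frozenset({"a"})), projector(frozenset({"b"}))) == projector(A)
eq5_holds = projector(A | B) == sub(add(projector(A), projector(B)), projector(A & B))
eq6_holds = (
    projector(A & B)
    == compose(projector(A), projector(B))
    == compose(projector(B), projector(A))
)
```

Note that `add(projector(A), projector(B))` assigns coefficient 2 to the shared outcome "b", which is why the overlap term has to be subtracted in Eq. (5).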

Let the ensemble $\psi \in V = \{\mathcal{P}_A\psi \mid A \subset S\}$ be a
"reference state", corresponding to the certain event. It encodes information
about the whole ensemble. Denote the probability of a set of outcomes
$A \subset S$ by $P_\psi(\mathcal{P}_A\psi)$. Clearly

$$P_\psi(\mathcal{P}_S\psi) = P_\psi(\psi) = 1 \tag{7}$$

by virtue of (A4). Also, by virtue of Eq. (5) and (A5),

$$P_\psi((\mathcal{P}_A + \mathcal{P}_B)\psi) = P_\psi(\mathcal{P}_A\psi) + P_\psi(\mathcal{P}_B\psi), \quad \text{if } A \cap B = \emptyset. \tag{8}$$

Assume that Eq. (8) also holds for $A \cap B \neq \emptyset$, and consider the
possibility that $A$ and $B$ can be identical. Eq. (8) may be written:

$$P_\psi((a\mathcal{P}_A + b\mathcal{P}_B)\psi) = aP_\psi(\mathcal{P}_A\psi) + bP_\psi(\mathcal{P}_B\psi), \quad \forall a, b \in \mathbb{N}. \tag{9}$$

Thus, the set $V$ naturally extends by means of the addition operator defined by
Eq. (3) to include all linear combinations of observed states, at minimum over
the natural numbers. If $A \cap B \neq \emptyset$, then
$P_\psi((\mathcal{P}_A + \mathcal{P}_B)\psi)$ may exceed unity, so clearly
$(\mathcal{P}_A + \mathcal{P}_B)\psi$ is not necessarily a possible observed
outcome. How should we interpret these new "nonphysical" states?

At each moment that an observation is possible, an observer faces a choice
about what observation to make. In the Multiverse, the observer differentiates
into multiple distinct observers, each with its own measurement basis. In this
view, there is no preferred basis.

The expression $P_\psi((a\mathcal{P}_A + b\mathcal{P}_B)\psi)$ must be the measure
associated with $a$ observers choosing to partition the ensemble into
$\{A, \bar{A}\}$ and observing an outcome in $A$, and $b$ observers choosing to
partition the ensemble into $\{B, \bar{B}\}$ and seeing outcome $B$. The
coefficients $a$ and $b$ must be drawn from a measure distribution over the
possible choices of measurement. The most general measure distributions are
complex, therefore the coefficients, in general, are complex. We can comprehend
easily what a positive measure means, but what about complex measures? What does
it mean to have an observer with measure $-1$? It turns out that these
non-positive measures correspond to observers who chose to examine observables
that do not commute with our current observable $A$. For example, if $A$ were
the observation of an electron's spin along the $z$ axis, then the states
$|+\rangle + |-\rangle$ and $|+\rangle - |-\rangle$ give identical outcomes as far
as $A$ is concerned. However, for another observer choosing to observe the spin
along the $x$ axis, the two states have opposite outcomes. This is the most general
way of partitioning the Multiverse amongst observers, and we expect to observe
the most general mathematical structures compatible with our existence.

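The probability function $P$ can be used to define an inner product as follows.
Our reference state $\psi$ can be expressed as a sum over the projected states
$\psi = \sum_{a \in S} \mathcal{P}_{\{a\}}\psi = \sum_{a \in S} \psi_a$. Let
$V^* = \mathcal{L}(\psi_a)$ be the linear span of this basis set. Then,
$\forall \phi, \xi \in V^*$, such that $\phi = \sum_{a \in S} \phi_a\psi_a$ and
$\xi = \sum_{a \in S} \xi_a\psi_a$, the inner product $\langle\phi, \xi\rangle$
is defined by

$$\langle\phi, \xi\rangle = \sum_{a \in S} \phi_a^* \xi_a P_\psi(\psi_a). \tag{10}$$

It is straightforward to show that this definition has the usual properties of an
inner product, and that $\psi$ is normalized ($\langle\psi, \psi\rangle = 1$). The
measures $\mu_a$ are given by

$$\mu_a = P_\psi(\psi_a) = \langle\psi_a, \psi_a\rangle = |\langle\psi, \hat\psi_a\rangle|^2, \tag{11}$$

where $\hat\psi_a = \psi_a / \sqrt{P_\psi(\psi_a)}$ is normalised.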
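The inner product of Eq. (10) and the measure formula of Eq. (11) can be verified numerically on a small example. The sketch below assumes hypothetical measures for a three-outcome observable and takes the complex conjugate on the first argument of the inner product; the names `P`, `inner` and `normalised_component` are illustrative.

```python
from math import isclose, sqrt
from typing import Dict

# Hypothetical measures P_psi(psi_a) for a three-outcome observable (they sum to 1).
P: Dict[str, float] = {"a": 0.5, "b": 0.3, "c": 0.2}

# A vector phi = sum_a phi_a psi_a is stored as its coefficients phi_a.
Vector = Dict[str, complex]


def inner(phi: Vector, xi: Vector) -> complex:
    """<phi, xi> = sum_a conj(phi_a) * xi_a * P_psi(psi_a)."""
    return sum(
        (complex(phi.get(a, 0)).conjugate() * complex(xi.get(a, 0)) * P[a] for a in P),
        0j,
    )


def normalised_component(a: str) -> Vector:
    """psi_hat_a = psi_a / sqrt(P_psi(psi_a)), which has unit norm."""
    return {a: 1 / sqrt(P[a])}


psi: Vector = {a: 1 for a in P}  # reference state: psi = sum_a psi_a
norm_psi = inner(psi, psi).real
mu = {a: abs(inner(psi, normalised_component(a))) ** 2 for a in P}
```

Here `norm_psi` recovers the normalisation of the reference state, and each `mu[a]` reproduces the measure that was put in, which is the content of Eq. (11).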

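Until now, we haven't used axiom (A6). Consider a sequence of sets of
outcomes $A_0 \supset A_1 \supset \ldots$, and denote by
$A \subset A_n\ \forall n$ the unique maximal subset (possibly empty), such that
$\bar{A} \cap \bigcap_n A_n = \emptyset$. Then the difference
$\mathcal{P}_{A_i} - \mathcal{P}_A$ is well defined, and so

$$\langle(\mathcal{P}_{A_i} - \mathcal{P}_A)\psi, (\mathcal{P}_{A_i} - \mathcal{P}_A)\psi\rangle = P_\psi((\mathcal{P}_{A_i} - \mathcal{P}_A)\psi) = P_\psi(\mathcal{P}_{A_i \cap \bar{A}}\psi). \tag{12}$$

By axiom (A6),

$$\lim_{n\to\infty} \langle(\mathcal{P}_{A_n} - \mathcal{P}_A)\psi, (\mathcal{P}_{A_n} - \mathcal{P}_A)\psi\rangle = 0, \tag{13}$$

so $\mathcal{P}_{A_i}\psi$ is a Cauchy sequence that converges to
$\mathcal{P}_A\psi \in V$. Hence $V$ is complete under the inner product (10).
It follows that $V^*$ is complete also, and is therefore a Hilbert space.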

The most general form of evolution of $\psi$ in continuous time is given by:

$$\frac{d\psi}{dt} = \mathcal{H}(\psi). \tag{14}$$

Some people may think that discreteness of the world's description (i.e. of the
Schmidhuber bitstring) must imply a corresponding discreteness in the
dimensions of the world. This is not true. Between any two points on a continuum,
there are an infinite number of points that can be described by a finite string,
the set of rational numbers being an obvious, but by no means exhaustive,
example. Continuous systems may be made to operate in a discrete way,
electronic logic circuits being an obvious example. For the sake of connection with
conventional quantum mechanics, we will assume that time is continuous. A
discrete time formulation can also be derived, in which case we need a difference
equation instead of Eq. (14). Other possibilities also exist, such as the rational
numbers example mentioned before. The theory of time scales could provide
a means of developing these other possibilities.

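Axiom (A3) constrains the form of the evolution operator $\mathcal{H}$. Since we
suppose that $\psi_a$ is also a solution of Eq. (14) (i.e. that the act of
observation does not change the physics of the system), $\mathcal{H}$ must be
linear. The certain event must have probability of 1 at all times, so

$$\frac{dP_{\psi(t)}(\psi(t))}{dt} = \frac{d}{dt}\langle\psi, \psi\rangle = \langle\psi, \mathcal{H}\psi\rangle + \langle\mathcal{H}\psi, \psi\rangle = 0 \quad\Longrightarrow\quad \mathcal{H}^\dagger = -\mathcal{H}, \tag{15}$$

i.e. $\mathcal{H}$ is $i$ times a Hermitian operator.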
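The role of the anti-Hermitian condition can be seen in a two-state example. The sketch below is not from the paper: it assumes an orthonormal basis with the standard inner product, picks the hypothetical generator $i\omega\sigma_x$ (which is $i$ times a Hermitian matrix), and uses its closed-form exponential; a Hermitian generator $\omega\sigma_x$ is included for contrast.

```python
from math import cos, cosh, isclose, sin, sinh
from typing import Tuple

State = Tuple[complex, complex]


def norm_sq(psi: State) -> float:
    """<psi, psi> in an orthonormal basis (an assumption of this sketch)."""
    return abs(psi[0]) ** 2 + abs(psi[1]) ** 2


def evolve_anti_hermitian(psi: State, omega: float, t: float) -> State:
    """Solve d(psi)/dt = H psi for H = i*omega*sigma_x.

    exp(H t) = cos(omega t) I + i sin(omega t) sigma_x.
    """
    c, s = cos(omega * t), 1j * sin(omega * t)
    return (c * psi[0] + s * psi[1], s * psi[0] + c * psi[1])


def evolve_hermitian(psi: State, omega: float, t: float) -> State:
    """Contrast case H = omega*sigma_x: exp(H t) = cosh(omega t) I + sinh(omega t) sigma_x."""
    c, s = cosh(omega * t), sinh(omega * t)
    return (c * psi[0] + s * psi[1], s * psi[0] + c * psi[1])


psi0: State = (0.6 + 0j, 0.8j)  # normalised: 0.36 + 0.64 = 1
conserved = norm_sq(evolve_anti_hermitian(psi0, omega=1.3, t=2.0))
not_conserved = norm_sq(evolve_hermitian(psi0, omega=1.3, t=2.0))
```

With the anti-Hermitian generator the probability of the certain event stays at 1 for all times, whereas the Hermitian generator makes the norm grow, so it cannot describe the evolution of a probability-carrying state.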
5 DISCUSSION



A conventional treatment of quantum mechanics (see e.g. Shankar) introduces
a set of 4-5 postulates that appear mysterious. In this paper, I introduce
a model of observation based on the idea of selecting actual observations from an
ensemble of possible observations, and can derive the usual postulates of quantum
mechanics aside from the Correspondence Principle. Even the property
of linearity is needed to allow disjoint observations to take place simultaneously
in the universe. Weinberg experimented with a possible non-linear generalisation
of quantum mechanics, but found great difficulty in producing
a theory that satisfied causality. This is probably due to the nonlinear terms
mixing up the partitioning $\{\psi_a, \mu_a\}$ over time. It is usually supposed that
causality, at least to a certain level of approximation, is a requirement for a
self-aware substructure to exist. It is therefore interesting that relatively mild
assumptions about the nature of SASes, as well as the usual interpretations of
probability and measure theory, lead to a linear theory with the properties we
know of as quantum mechanics. Thus we have a reversal of the usual ontological
status between Quantum Mechanics and the Many Worlds Interpretation.